Toward error-bounded algorithms for infinite-horizon DEC-POMDPs
Authors
Abstract
Over the past few years, progress in scaling up infinite-horizon DEC-POMDPs with discounted rewards has come mainly from approximate algorithms, which lack the theoretical guarantees of their exact counterparts. In contrast, ε-optimal methods are of theoretical interest only and are not efficient in practice. In this paper, we introduce an algorithmic framework (β-PI) that exploits the scalability of the former while preserving the theoretical properties of the latter. Building on β-PI, we derive a family of approximate algorithms that can find (provably) error-bounded solutions in reasonable time. Among this family, H-PI uses a branch-and-bound search method that computes a near-optimal solution over distributions over the histories experienced by the agents. These distributions often lie near a structured, low-dimensional subspace embedded in the high-dimensional sufficient statistic. By planning only on this subspace, H-PI successfully solves all tested benchmarks, outperforming standard algorithms in both solution time and policy quality.
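The abstract names branch-and-bound but does not describe how H-PI's search works. As a hedged illustration only, the Python sketch below shows one generic way such a search can select a joint decision rule (one action per agent per private history) that maximizes expected immediate reward under a fixed distribution over joint histories. The function `best_joint_decision_rule`, the input dictionaries, and the optimistic bound are assumptions made for this example, not the authors' H-PI.

```python
from itertools import product


def best_joint_decision_rule(occupancy, reward, actions):
    """Hypothetical branch-and-bound sketch (not the paper's H-PI).

    occupancy: dict mapping a joint history (h_1, ..., h_n) to its probability
    reward:    dict mapping (joint history, joint action) to expected reward
    actions:   list whose i-th entry is agent i's list of actions

    Returns (value, rule), where rule[i] maps agent i's private history to an
    action and value = sum_h occupancy[h] * reward[(h, joint action)].
    """
    n = len(actions)
    # One decision variable per (agent, private history) pair, in first-seen order.
    variables = list(dict.fromkeys((i, h[i]) for h in occupancy for i in range(n)))
    best = {"value": float("-inf"), "rule": None}

    def upper_bound(assign):
        # Optimistic value: each joint history picks its best joint action
        # among those consistent with the actions fixed so far. This never
        # underestimates any completion and is exact once all variables are set.
        total = 0.0
        for h, p in occupancy.items():
            choices = [
                [assign[(i, h[i])]] if (i, h[i]) in assign else actions[i]
                for i in range(n)
            ]
            total += p * max(reward[(h, a)] for a in product(*choices))
        return total

    def search(k, assign):
        bound = upper_bound(assign)
        if bound <= best["value"]:
            return  # prune: this branch cannot beat the incumbent
        if k == len(variables):
            # Complete assignment: the bound equals the exact expected reward.
            best["value"] = bound
            best["rule"] = [
                {hist: act for (j, hist), act in assign.items() if j == i}
                for i in range(n)
            ]
            return
        agent, _ = variables[k]
        for a in actions[agent]:
            assign[variables[k]] = a
            search(k + 1, assign)
            del assign[variables[k]]

    search(0, {})
    return best["value"], best["rule"]


# Toy two-agent coordination example with made-up numbers: both agents should
# play "L" after joint history ("x", "x") and "R" after ("y", "y").
if __name__ == "__main__":
    acts = [["L", "R"], ["L", "R"]]
    occ = {("x", "x"): 0.5, ("y", "y"): 0.5}
    good = {(("x", "x"), ("L", "L")), (("y", "y"), ("R", "R"))}
    rew = {(h, a): float((h, a) in good) for h in occ for a in product(*acts)}
    print(best_joint_decision_rule(occ, rew, acts))
```

The bound relaxes the constraint that each agent may act only on its own private history, so it is cheap to evaluate at every node; exhaustive enumeration would instead grow exponentially in the number of private histories. A tighter bound would prune more branches at a higher cost per node.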
Similar resources
Bounded Dynamic Programming for Decentralized POMDPs
Solving decentralized POMDPs (DEC-POMDPs) optimally is a very hard problem. As a result, several approximate algorithms have been developed, but these do not have satisfactory error bounds. In this paper, we first discuss optimal dynamic programming and some approximate finite horizon DEC-POMDP algorithms. We then present a bounded dynamic programming algorithm. Given a problem and an error bou...
Producing Efficient Error-Bounded Solutions for Transition Independent Decentralized MDPs
There has been substantial progress on algorithms for single-agent sequential decision making using partially observable Markov decision processes (POMDPs). A number of efficient algorithms for solving POMDPs share two desirable properties: error-bounds and fast convergence rates. Despite significant efforts, no algorithms for solving decentralized POMDPs benefit from these properties, leading ...
Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs
We address decentralized stochastic control problems represented as decentralized partially observable Markov decision processes (Dec-POMDPs). This formalism provides a general model for decision-making under uncertainty in cooperative, decentralized settings, but the worst-case complexity makes it difficult to solve optimally (NEXP-complete). Recent advances suggest recasting Dec-POMDPs into c...
Efficient Planning for Factored Infinite-Horizon DEC-POMDPs
Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other and the environment. This planning task arises in optimization of wireless networks, and other scenarios where communication between agents is r...
Parallel Rollout for Online Solution of Dec-POMDPs
The scalability of algorithms for solving decentralized POMDPs is a major research challenge because of their double exponential worst-case complexity for finite-horizon problems. The first algorithms were only able to solve very small instances over very short horizons. One exception is the Memory-Bounded Dynamic Programming algorithm, an approximation technique that has proved effici...